Max-margin Latent Dirichlet Allocation for Image Classification and Annotation

Authors

Yang Wang

Greg Mori

Abstract

Much work in image classification and labeling uses topic models (e.g. LDA [1]), a class of powerful tools originally proposed for text modeling that have recently gained much popularity in computer vision. Despite the success of topic models in visual recognition, we believe the way they are used in computer vision has some limitations. First of all, most topic models are unsupervised. This means the topics they discover are not necessarily the ones that are useful for discriminative tasks such as image classification. To address this issue, several supervised variants of topic models have been developed. But most of those models assume the “bag-of-words” image representation, i.e. an image is represented by a collection of unordered feature descriptors computed from small local patches. Although the “bag-of-words” representation has proven successful, other more holistic image representations (e.g. GIST [2]) have been shown to be powerful in many applications too. It is desirable to have the best of both worlds and design a model that can exploit both types of feature representation.

In this paper, we propose max-margin latent Dirichlet allocation (MMLDA), a variant of MedLDA [3]. We introduce two versions of MMLDA: MMLDAc for image classification and MMLDAa for image annotation. MMLDAc is based on MedLDA. The main difference is that MedLDA uses only the latent topics as the feature vector for classification, while MMLDAc uses latent topics together with any other image features. This extension allows MMLDAc to make use of image features (e.g. GIST) that cannot be easily represented as bag-of-words. MMLDAa is an extension of MMLDAc for image annotation. In image annotation, the goal is to choose a set of annotation terms (also called tags) to describe an image. Since an image can be associated with more than one tag, image annotation is a multi-label classification problem. In MMLDAa, the various tags are implicitly coupled by the latent topics defined in the model. Training MMLDAa results in topic representations that are suitable for predicting those tags.

MMLDAc: We use $x$ to denote an image and $w$ to denote the bag-of-words representation of $x$, e.g. $w$ can be obtained by vector quantization of SIFT descriptors. The topic assignment of the words in the document is denoted by $z$. We assume a linear discriminative function of the form $F(y, z, w, x, \eta) = \eta_y^\top f(z, w, x)$. Note that the definition of $F(\cdot)$ is similar to that in MedLDA. In fact, if we assume $f(z, w, x) = \bar{z} = \frac{1}{N} \sum_{n=1}^{N} z_n$, we recover $F(\cdot)$ in MedLDA. So the definition of $F(\cdot)$ in MMLDA is a strict generalization of that in MedLDA. One important thing to remember is that since $z$ is not observed, $f(z, w, x)$ is actually a random vector implicitly defined by the distribution over $z$. We assume $f(z, w, x)$ is a concatenation of two sub-vectors, $f(z, w, x) = \mathrm{cat}(\bar{z}; g(w, x))$, where $g(w, x)$ is a vector defined on $w$ and $x$, $\bar{z} = \frac{1}{N} \sum_{n=1}^{N} z_n$ is defined as in sLDA and MedLDA, and $\mathrm{cat}(a; b)$ denotes the concatenation of two vectors $a$ and $b$. Notice that we make no assumption on the form of $g(w, x)$; it can be any feature vector extracted from the image, e.g. a histogram of words, GIST descriptors, or both. Similarly, we assume $\eta_y$ is also a concatenation of two sub-vectors, $\eta_y = \mathrm{cat}(\zeta_y; \nu_y)$, so that $\eta_y^\top f(z, w, x) = \zeta_y^\top \bar{z} + \nu_y^\top g(w, x)$. Fig. 1 (a) shows a graphical illustration of MMLDAc. Similar to MedLDA, we learn the model parameters by solving an optimization problem.
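As a minimal illustration of the discriminant function $F$ described above (not of the training problem), the sketch below evaluates $\zeta_y^\top \bar{z} + \nu_y^\top g(w, x)$ with NumPy. It rests on simplifying assumptions that are not from the paper: the topic assignment `z` is treated as a single fixed sample of topic indices rather than a latent variable with a distribution, `g` is a precomputed feature vector, and all names (`mean_topic_assignment`, `discriminant`, `predict`, `zeta`, `nu`) are hypothetical.

```python
import numpy as np

def mean_topic_assignment(z, num_topics):
    """Return z-bar: the average of the one-hot topic indicators z_n.

    z is assumed to be a 1-D integer array of topic indices, one per word.
    """
    return np.bincount(z, minlength=num_topics) / len(z)

def discriminant(y, z, g, zeta, nu):
    """Evaluate F for class y as zeta_y^T z-bar + nu_y^T g.

    zeta has shape (num_classes, num_topics); nu has shape
    (num_classes, len(g)). Both are hypothetical parameter arrays.
    """
    z_bar = mean_topic_assignment(z, zeta.shape[1])
    return zeta[y] @ z_bar + nu[y] @ g

def predict(z, g, zeta, nu):
    """Pick the class with the highest score (assumes z is given)."""
    z_bar = mean_topic_assignment(z, zeta.shape[1])
    scores = zeta @ z_bar + nu @ g
    return int(np.argmax(scores))

# Toy example: 4 words, 3 topics, 2 classes, 2 extra image features.
z = np.array([0, 0, 1, 2])
g = np.array([0.5, 2.0])  # stand-in for e.g. a GIST descriptor
zeta = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
nu = np.array([[0.0, 0.0], [1.0, 0.0]])
print(discriminant(0, z, g, zeta, nu), predict(z, g, zeta, nu))
```

Setting `nu` to zeros reduces the score to the topic-only term, which corresponds to the MedLDA special case mentioned in the text.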

Similar resources

Tile-Level Annotation of Satellite Images Using Multi-Level Max-Margin Discriminative Random Field

This paper proposes a multi-level max-margin discriminative analysis (MDA) framework, which takes both coarse and fine semantics into consideration, for the annotation of high-resolution satellite images. In order to generate more discriminative topic-level features, the MDA uses the maximum entropy discrimination latent Dirichlet Allocation (MedLDA) model. Moreover, for improving the spatial c...

Automatic Music Annotation

In the last ten years, computer-based systems have been developed to automatically classify music according to a high-level musical concept such as genre or instrumentation. These automatic music annotation systems are useful for the storage and retrieval of music from a large database of musical content. In general, a system begins by extracting features for each song. The labels and features ...

Multi-Modal Image Annotation with Multi-Label Multi-Instance LDA

This paper studies the problem of image annotation in a multi-modal setting where both visual and textual information are available. We propose Multimodal Multi-instance Multi-label Latent Dirichlet Allocation (M3LDA), where the model consists of a visual-label part, a textual-label part and a label-topic part. The basic idea is that the topic decided by the visual information and the topic deci...

MedLDA: maximum margin supervised topic models

A supervised topic model can use side information such as ratings or labels associated with documents or images to discover more predictive low dimensional topical representations of the data. However, existing supervised topic models predominantly employ likelihood-driven objective functions for learning and inference, leaving the popular and potentially powerful max-margin principle unexploit...

Automatic Image Annotation and Retrieval Using the Latent Dirichlet Allocation Model

Content-based image retrieval faces a vital problem, namely “semantic gap” that exists between low level features and semantic concept. In order to solve this problem, image automatic annotations that allow users to access a large image database with textual queries are put forward. In this paper, the main study concentrates on an automatic image annotation method based on vector quantization (...

Publication date: 2011
